Coordination patterns reveal online political astroturfing across the world

Online political astroturfing, i.e., hidden information campaigns in which a political actor mimics genuine citizen behavior by incentivizing agents to spread information online, has become prevalent on social media. Such inauthentic information campaigns threaten to undermine the Internet’s promise of more equitable participation in public debates. We argue that the logic of social behavior within the campaign bureaucracy and principal–agent problems lead to detectable activity patterns among the campaign’s social media accounts. Our analysis uses a network-based methodology to identify such coordination patterns in all campaigns contained in the largest publicly available database on astroturfing published by Twitter. On average, 74% of the involved accounts in each campaign engaged in a simple form of coordination that we call co-tweeting and co-retweeting. Comparing the astroturfing accounts to various systematically constructed comparison samples, we show that the same behavior is negligible among the accounts of regular users that the campaigns try to mimic. As its main substantive contribution, the paper demonstrates that online political astroturfing consistently leaves similar traces of coordination, even across diverse political and country contexts and different time periods. The presented methodology is a reliable first step for detecting astroturfing campaigns.


S1 Related literature
Gurajala et al. [3] | 62 million users and their tweets crawled from Twitter | not specified | Pattern-matching on screen-names and tweeting times | No ground truth | Random sample of Twitter users (called "ground truth" in the paper)
Keller et al. 2019 [4] | Election campaign, South Korea, 2012 | 1 | Network-based approach | Confiscated lists of accounts published in court proceedings | Regular users posting on election campaign | Systematically constructed country-specific location-based random samples and hashtag-based random samples
Vargas et al. 2020 [5] | Information operations by state-linked entities | 10 | Network-based approach + machine learning | Accounts involved in information operations (released by Twitter) | UK Parliament, US Congress, Academics, Random

S2 Overview of all astroturfing campaigns identified by Twitter
Table S2 summarizes all data releases by Twitter until mid-September 2020. The dataset identifiers are taken from the names of the files that Twitter provided. Note that some datasets are part of the same campaign according to Twitter, such as the three datasets on China released in 2019. The co-tweet analysis also indicates that the Iranian datasets released in 2019 are likely part of the same campaign. We also include data on South Korea, which was never discovered or released by Twitter, but which we use in our analysis. Table S2 shows that the campaigns employed a variety of strategies. Some campaigns relied mostly on retweeting. Others made extensive use of hashtags in an attempt to get them to trend. Campaigns also differ in their propensity to link to outside material, i.e., how often they share newspaper articles or links to social content such as YouTube videos. The column % Detectable (accounts) indicates the percentage of accounts that would be classified as belonging to an astroturfing campaign based on our co-tweet or co-retweet method, meaning that they either co-tweet or co-retweet with another account within a one-minute time window.
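To illustrate how a "% Detectable (accounts)" figure of this kind could be computed, the following is a minimal sketch rather than the implementation used for Table S2. It assumes that a co-tweet means two different accounts posting an identical original text, and a co-retweet means two different accounts retweeting the same tweet, in both cases within one minute of each other. The function names, the tuple layout of the input and the toy data are hypothetical, and any additional thresholds applied in the main paper are not modeled.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

WINDOW = timedelta(minutes=1)  # one-minute window, as stated in the text


def coordinated_pairs(posts, window=WINDOW):
    """Count, per account pair, how often both posted the same key within `window`.

    `posts` holds (account_id, key, timestamp) tuples. Assumption: for co-tweets
    the key is the text of an original tweet; for co-retweets it is the ID of
    the retweeted tweet.
    """
    by_key = defaultdict(list)
    for account, key, ts in posts:
        by_key[key].append((ts, account))

    pairs = defaultdict(int)
    for items in by_key.values():
        items.sort()  # chronological order, so t2 >= t1 below
        for (t1, a1), (t2, a2) in combinations(items, 2):
            if a1 != a2 and t2 - t1 <= window:
                pairs[frozenset((a1, a2))] += 1
    return dict(pairs)


def share_detectable(all_accounts, *pair_dicts):
    """Percentage of accounts that appear in at least one coordinated pair."""
    accounts = set(all_accounts)
    flagged = set()
    for pairs in pair_dicts:
        for pair in pairs:
            flagged |= pair
    return 100 * len(flagged & accounts) / len(accounts)


# Toy data (hypothetical): five accounts, original tweets and retweets.
t0 = datetime(2019, 6, 1, 12, 0, 0)
tweets = [
    ("a", "same text", t0),
    ("b", "same text", t0 + timedelta(seconds=30)),  # co-tweets with "a"
    ("c", "same text", t0 + timedelta(minutes=5)),   # outside the window
    ("d", "other text", t0),
]
retweets = [
    ("c", "tweet-1", t0),
    ("d", "tweet-1", t0 + timedelta(seconds=59)),    # co-retweets with "c"
    ("e", "tweet-2", t0),
]

co_tweets = coordinated_pairs(tweets)
co_retweets = coordinated_pairs(retweets)
share = share_detectable("abcde", co_tweets, co_retweets)
```

In this toy example, accounts a, b, c and d are flagged and e is not. The pairwise comparison within each key is quadratic and is chosen for readability; a sliding window over the sorted timestamps would scale better on large data releases.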
Table S3 provides the same statistics for the datasets used in the main paper. The three IRA datasets targeting Germany, Russia and the U.S. were extracted from the complete IRA dataset released by Twitter using the account languages German, Russian and English, respectively. The Hong Kong dataset is a truncated version of the three China-related datasets released in 2019, limited to the time period relevant for the Hong Kong protests. It discards any tweets posted before April 1, 2019, as many of those accounts were apparently used for different (spamming) purposes before they became part of a more targeted, China-led campaign. The South Korea dataset is a 10% live collection of all Korean-language tweets posted in June-December 2012.

Table S3: Statistics of astroturfing datasets analyzed in the main paper, including number of tweets and accounts involved, percentage of retweets, tweets containing hashtags and URLs, and detectable accounts using co-(re)tweeting.

S4 Analysis of all astroturfing campaigns identified by Twitter
[Figure panels: catalonia_201906_1, china_052020, russia_202012, serbia_022020, spain_082019_1, thailand_092020]

S5 Temporal patterns of astroturfing activity
Figure S3 replicates Figure 1 in the main paper, but for the hashtag-based instead of the location-based random samples.

Figure S3: Comparison of hourly (left) and weekly (right) activity of astroturfing campaigns and random samples.
Figure S4: Daily activity pattern of astroturfing accounts, excluding russia_201906_1 and saudi_arabia_082019_1, which had a minuscule number of tweets.
Figure S5: Comparison of co-tweets between astroturfing campaigns and location-based random samples.
Figure S6: Comparison of co-retweets between astroturfing campaigns and location-based random samples.
Figure S8: Comparison of co-tweets between astroturfing campaigns and hashtag-based random samples.
Figure S9: Comparison of co-retweets between astroturfing campaigns and hashtag-based random samples.
Figure S10: Comparison of retweets between astroturfing campaigns and hashtag-based random samples.

S7 #allesindenArm: a grassroots campaign as another benchmark
In the main analysis, we used two different random samples as our benchmark and have argued that those samples were constructed to resemble the kinds of accounts that the astroturfing campaigns try to imitate. However, such a random sample may not contain many accounts that are connected with each other and therefore may be unlikely to coordinate. In contrast, grassroots movements may be initiated by groups of accounts that are aware of each other and therefore may coordinate, even in the absence of centralized instructions. But retroactively finding a suitable comparable grassroots campaign for each of our astroturfing campaigns appears impossible: it would require an in-depth study of the respective Twittersphere in the relevant language and knowledge of the information campaigns active at the time.
To give the reader a sense of the type of message coordination that occurs during a grassroots campaign, and of the associated network patterns, we therefore examine a German grassroots campaign from November 2021 that sought to boost vaccination against COVID-19. Under the hashtag #allesindenArm (roughly translated as "get a jab"), influencers and regular social media users posted personal messages to motivate other users to get vaccinated. However, there was also fierce resistance on Twitter, with people arguing against the campaign and vaccines in general (https://www.br.de/nachrichten/netzwelt/allesindenarmwas-steckt-hinter-dem-social-trend).
Our manual inspection of the tweets indicated that anti-vaxx accounts and users associated with the "Querdenken" movement, which mobilizes against governmental COVID-19 countermeasures, hijacked the hashtag to spread their critical message. Yet we have no evidence that those accounts engaged in astroturfing, and even if they did, it would only make the test of distinguishing this genuine grassroots campaign from astroturfing harder for us. This case can be regarded as a "hard" test in other regards as well: several celebrities were part of the campaign, giving it somewhat more central coordination than one would usually expect in a pure grassroots campaign.
To collect data on this case, we used the Academic Research API to gather the roughly 130,000 tweets that used the hashtag #allesindenArm from the beginning of 2021 onward (the vast majority of these tweets were posted on November 14 and 15). We selected the accounts most active in the campaign, i.e., the 1,707 accounts that posted at least five tweets using the hashtag, which amounts to 10% of the roughly 17,000 users who used the hashtag. This selection method again makes this a harder test: the more active a user, the more likely they are to post the same message as another user.
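The selection rule described above can be sketched in a few lines. This is only an illustration under the assumption that each collected tweet is reduced to its author ID; the function name and the toy author list are hypothetical.

```python
from collections import Counter

MIN_TWEETS = 5  # threshold stated in the text: at least five hashtag tweets


def most_active_accounts(tweet_authors, min_tweets=MIN_TWEETS):
    """Return the accounts that authored at least `min_tweets` of the tweets."""
    counts = Counter(tweet_authors)
    return {account for account, n in counts.items() if n >= min_tweets}


# Toy data (hypothetical): one author ID per collected hashtag tweet.
authors = ["u1"] * 7 + ["u2"] * 5 + ["u3"] * 4 + ["u4"]
selected = most_active_accounts(authors)

# Share of all hashtag users retained, using the figures reported in the text.
reported_share = 1707 / 17000
```

With the reported figures, the retained share is roughly 0.10, in line with the 10% stated above.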
We then gathered the last 3,200 tweets posted by each of these 1,707 accounts and searched for instances of co-tweeting or co-retweeting. Figure S11 shows the combined co-tweet and co-retweet network, with the same thresholds applied as in our astroturfing and comparison samples. Unlike in the case of all astroturfing campaigns, there is no large network component that contains the majority of the accounts involved. Only 320 of the 1,707 accounts appear in this network, which contains only 260 edges (and thus has a density of 0.005). The network more closely resembles that of the comparison samples: a few smaller components, with the vast majority of accounts (not shown) being isolates. In short: while there is some organic coordination of messages among the accounts participating in this grassroots campaign, it is much more limited than the (centralized) coordination we observe in all the astroturfing campaigns examined in this paper.

Figure S11: Message coordination among the 1,707 accounts most involved in the grassroots campaign #allesindenArm. Isolates not shown.

One might suspect that by selecting only 10% of the participants in this campaign, even if they are the most active ones, we are choosing a sample of unconnected users who cannot organically coordinate their messages because they are not aware of each other, which would lead us to underestimate this type of message coordination in genuine grassroots campaigns. However, as Figure S12 demonstrates, a significant proportion of these 1,707 accounts (1,624 nodes and 46,298 edges; density 0.02) do indeed follow each other and could potentially co-tweet or copy-paste messages from each other, but they do not engage in these practices nearly as frequently as astroturfing campaigns do.

Figure S12: Follower network of the 1,707 accounts most involved in the grassroots campaign #allesindenArm.

While the #allesindenArm campaign is just one specific grassroots campaign, we take this as evidence that genuine grassroots campaigns do indeed display message coordination patterns that differ from those of astroturfing campaigns; in fact, they seem much more similar to the comparison samples presented in the main text. This also pertains to the hourly and day-of-week patterns, as Figures S13 and S14 show: participating accounts tend to be most active after work and on weekends, rather than during office hours and on workdays.
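The two density values can be checked with a short calculation. The sketch below assumes that the coordination network (320 nodes, 260 edges) is undirected and that the follower network (1,624 nodes, 46,298 edges) is directed; the text does not state this explicitly, but these are the assumptions under which the reported values of 0.005 and 0.02 are reproduced after rounding. The function name is hypothetical.

```python
def density(n_nodes, n_edges, directed=False):
    """Share of possible ties that are present in a simple graph without loops."""
    possible = n_nodes * (n_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible /= 2                   # unordered pairs for undirected graphs
    return n_edges / possible


# Assumption: the co-tweet/co-retweet network is undirected.
coordination_density = density(320, 260)

# Assumption: the follower network is directed (A follows B).
follower_density = density(1624, 46298, directed=True)
```

Under these assumptions, the follower network is several times denser than the coordination network, which supports the point that these accounts are connected to each other but rarely coordinate their messages.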
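One simple way to summarize such hourly and day-of-week patterns is to count tweets per hour and per weekday and to compute the share posted during office hours. The following is a minimal sketch; it assumes that timestamps have already been converted to the accounts' local time and that office hours run from 09:00 to 17:00 on Monday to Friday, neither of which is specified in the text. The function name and toy timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def office_hours_share(timestamps, start_hour=9, end_hour=17):
    """Fraction of timestamps falling on Mon-Fri with start_hour <= hour < end_hour."""
    if not timestamps:
        return 0.0
    in_office = sum(
        1
        for ts in timestamps
        if ts.weekday() < 5 and start_hour <= ts.hour < end_hour
    )
    return in_office / len(timestamps)


# Toy data (hypothetical), assumed to be in local time.
timestamps = [
    datetime(2021, 11, 13, 11, 0),   # Saturday morning
    datetime(2021, 11, 14, 20, 0),   # Sunday evening
    datetime(2021, 11, 15, 10, 0),   # Monday, during office hours
    datetime(2021, 11, 15, 19, 30),  # Monday evening, after work
]

by_hour = Counter(ts.hour for ts in timestamps)          # hourly activity
by_weekday = Counter(ts.weekday() for ts in timestamps)  # 0 = Monday, 6 = Sunday
share = office_hours_share(timestamps)
```

A low office-hours share, as in this toy example, corresponds to the after-work and weekend pattern described for the grassroots accounts, whereas a high share would point toward activity during regular working time.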